ABSTRACT

The ability of naive listener-judges to recognize the
affective state of a speaker on the basis of nonlinguistic auditory
cues, independent of the verbal content of an utterance, has been well
established by a large number of studies. This study used artificial
stimuli produced by a Moog synthesizer to vary pitch level and
variation, amplitude level and variation, and signal duration and
speed (tempo) systematically in a factorial design. The stimuli used,
raters employed, procedure, and results are presented for the two
studies conducted. The results supported the contention that the
attribution of emotional meaning from auditory stimuli is based on
characteristic patterns of acoustic cues. This study suggested a
rapprochement between studies on emotional expression in speech and
the psychological investigation of emotion in music, with interesting
implications concerning speculations on the common origin of music
and speech in primitive emotional displays of our prehistoric
ancestors. (Author/BW)
Abstract



Electronically synthesized tone sequences with systematic 
variations of pitch, amplitude, and tempo were rated on 
emotional expressiveness. The results support the contention
that dimensions of emotional meaning are communicated by
specific patterns of acoustic cues. Implications concerning
unlearned neural programs of emotional expression in speech
and music are discussed. 
Acoustic concomitants of emotional dimensions:

Judging affect from synthesized tone sequences 

Problem: The ability of naive listener-judges to recognize the affective
state of a speaker on the basis of nonlinguistic auditory cues, independent
of the verbal content of an utterance, has been well established by a
large number of studies, summarized by Kramer (1963), Davitz (1964),
Vetter (1969), and Scherer (1970). Results of a recent study by Scherer,
Rosenthal, and Koivumaki (1971), using content masking by random splicing
(Scherer, 1971), electronic content filtering (Rogers, Scherer, and Rosenthal,
1971), and their combination, suggest that a minimal set of vocal cues,
consisting of pitch level and variation, amplitude level and variation,
and rate of articulation or tempo, may be sufficient to communicate the
evaluation, potency, and activity dimensions of emotional meaning.

To assess more precisely how inferences of emotional content are based on
specific acoustic cues and their combinations, one would want to be able to
manipulate these cues experimentally. Since, in spite of recent advances in
speech synthesis, this is rather difficult to achieve with actual speech
signals, the present study used artificial stimuli produced by a Moog
synthesizer to vary pitch level and variation, amplitude level and
variation, and signal duration and speed (tempo) systematically in a
factorial design.

Study I 

Stimuli: A simple tone sequence modeled after the intonation contour of
a short sentence, consisting of eight sine wave tones of different pitch
and duration, was synthesized repeatedly on a Moog electronic synthesizer
with a sequencing unit. Five parameters of the sequence were varied
independently in a 4x2x2x2x2 factorial design with the following levels
on each parameter: pitch variation - moderate, extreme, up contour, down
contour; amplitude variation - moderate, extreme; pitch level - high, low;
amplitude level - low, high; tempo - slow, fast. The resulting 64 stimuli,
rendered twice each, were edited in random order onto a demonstration
tape.
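The 4x2x2x2x2 design can be made concrete with a short sketch that enumerates every combination of the five parameter levels listed above and builds a shuffled presentation order with two renditions per stimulus. The level labels come from the text; the function names, the fixed random seed, and the assumption that both renditions were shuffled together in one sequence are illustrative choices, not a reconstruction of the original tape-editing procedure.

```python
import itertools
import random

# Levels of the five acoustic parameters, as listed in the Stimuli paragraph.
FACTORS = {
    "pitch_variation": ["moderate", "extreme", "up contour", "down contour"],
    "amplitude_variation": ["moderate", "extreme"],
    "pitch_level": ["high", "low"],
    "amplitude_level": ["low", "high"],
    "tempo": ["slow", "fast"],
}


def build_design(factors):
    """Return every cell of the full factorial design as a dict of levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in itertools.product(*factors.values())]


def tape_order(design, renditions=2, seed=0):
    """Repeat each stimulus `renditions` times and shuffle (hypothetical procedure)."""
    sequence = [cell for cell in design for _ in range(renditions)]
    random.Random(seed).shuffle(sequence)
    return sequence


design = build_design(FACTORS)
print(len(design))              # 4 * 2 * 2 * 2 * 2 = 64 stimuli
print(len(tape_order(design)))  # 128 presentations on the tape
```

Representing each cell as a dictionary keeps the level labels attached to every stimulus, which makes later tabulation of ratings by parameter level straightforward.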
Raters: Ten undergraduates, six male and four female, served as raters.
They were recruited through sign-up sheets and were paid.

Procedure: The raters heard the tape-recorded stimuli in random order and
were asked to rate each sample on ten-point scales of pleasantness, 
evaluation, activity, and potency as well as to indicate whether the sample 
to be rated could or could not be an expression of the following emotions: 
interest, sadness, fear, happiness, disgust, anger, surprise, elation, 
boredom. 

Results: Table 1 shows F-ratios, significance levels, and the direction
of the effect for the main effects, and for the two-way interactions with
p < .01, yielded by a five-way analysis of variance with repeated measures.
The parameters that seem to have had the most influence on the judges'
ratings are tempo and pitch variation. Moderate pitch variation leads to
ratings of generally unpleasant emotions, such as sadness, fear, disgust,
and boredom, showing little activity or potency. Extreme pitch variation
and up contours produce ratings of highly pleasant, active, and potent
emotions such as happiness, interest, surprise, and also fear. Down contours
have similar effects but do not seem to contain elements of surprise or
uncertainty.
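Because each of the ten raters judged every stimulus, every effect in Table 1 is a within-subject test. Assuming the common approach of testing each effect against its own effect-by-rater error term, the repeated-measures F for a two-level factor (amplitude variation, pitch level, amplitude level, tempo) has df (1, 9) and equals the square of a paired t computed on each rater's marginal means. The sketch below illustrates that identity and the "direction" column (the level with the higher mean). It is not the original analysis; the function names are hypothetical, and the four-level pitch-variation factor, with df (3, 27), would need the general within-subject ANOVA formula.

```python
from math import sqrt
from statistics import mean, stdev


def repeated_measures_f(level_a, level_b):
    """F(1, n - 1) for a two-level within-subject factor.

    level_a[i] and level_b[i] are rater i's mean ratings at the two levels,
    each averaged over all other factors. With the factor-by-rater
    interaction as the error term, this F equals the squared paired t.
    """
    if len(level_a) != len(level_b) or len(level_a) < 2:
        raise ValueError("need paired ratings from at least two raters")
    diffs = [a - b for a, b in zip(level_a, level_b)]
    n = len(diffs)
    # Paired t: mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t * t, (1, n - 1)


def direction(level_means):
    """Return the level with the highest mean rating (cf. footnote a of Table 1)."""
    return max(level_means, key=level_means.get)
```

With ten raters, `repeated_measures_f` returns degrees of freedom (1, 9), against which the single, double, and triple asterisks in Table 1 would be evaluated.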

Fast tempo leads to an attribution of high activity and potency as in the
emotions of interest, fear, happiness, anger, and surprise. Slow tempo 
is seen as indicative of sadness, disgust, and boredom. 

Extreme amplitude variation is seen as active and potent, mostly
indicative of the emotions of fear and anger, whereas moderate amplitude
variation is seen as happiness or disgust. High pitch level yields ratings
of happiness and surprise; low pitch level, on the other hand, leads to
ratings of disgust and boredom. High amplitude level leads to ratings of
potency.

There is some evidence for differential acoustic manifestations of
different types of specific emotions. For example, whereas anger is generally
characterized by extreme amplitude variation and fast tempo, which may
represent "hot" anger, a significant interaction effect shows that moderate
pitch variation and moderate amplitude variation combine to produce higher
ratings on anger, possibly indicative of "cool" anger. Another interesting
interaction occurs between down contour and high pitch level: this
combination leads to consistently higher ratings on activity and surprise,
which are usually associated with up contours, and may represent a special
type of novel situation.



Study II

Stimuli: Sixteen of the 64 stimuli used in Study I were chosen to represent
happiness, fear, anger, and sadness.

Raters: A total of 166 undergraduates, 69 male and 97 female, rated the
stimuli during a class demonstration.

Procedure: For each of the 16 stimuli, the raters were asked to choose
between a pair of alternative labels. The "correct" label was the one with
the higher mean rating for the respective stimulus in Study I.

Results: The frequency distribution of the raters over the number of
correct choices is shown in the following table:

Number of correct choices    1-7    8    9   10   11   12   13   14   15   16  Total
Number of raters               0    3    8   15   21    ?   39   36    8    1    166



There were no significant differences in accuracy between male and female
raters. The degree of accuracy shown by the judges is far above what would
be expected by chance (p < .001).^3 Furthermore, most of the errors were due
to inaccurate choices on 4 of the 16 stimuli,^4 the error distribution being
significantly different from chance (p < .001).^5
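With two alternative labels per stimulus, a rater guessing at random has a probability of .5 of being correct on each of the 16 stimuli, so under pure guessing the number of correct choices follows a binomial distribution with mean 8 and standard deviation 2. The sketch below computes the expected number of raters in each column of the frequency table above under that assumption, together with a generic Pearson chi-square statistic. It assumes independent guesses and pools 0-7 correct into the lowest column so that the probabilities sum to one; footnote 3 indicates that the original test used a fit to the normal distribution, so this exact-binomial version need not reproduce the reported statistic.

```python
from math import comb

N_STIMULI = 16
N_RATERS = 166
P_CORRECT = 0.5  # two alternative labels per stimulus


def chance_probabilities(n=N_STIMULI, p=P_CORRECT):
    """Binomial probability of k correct choices, k = 0..n, under guessing."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def expected_counts(bins, n_raters=N_RATERS):
    """Expected number of raters per bin; each bin is an inclusive (lo, hi) range."""
    probs = chance_probabilities()
    return [n_raters * sum(probs[lo : hi + 1]) for lo, hi in bins]


def pearson_chi_square(observed, expected):
    """Sum of (O - E)^2 / E over bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Bins mirroring the table columns: 0-7 pooled (assumption), then 8 through 16.
BINS = [(0, 7)] + [(k, k) for k in range(8, 17)]
expected = expected_counts(BINS)
mean_chance = N_STIMULI * P_CORRECT                           # 8.0
sd_chance = (N_STIMULI * P_CORRECT * (1 - P_CORRECT)) ** 0.5  # 2.0
```

Comparing `expected` with the observed column counts shows how far the raters' distribution is shifted above the chance mean of 8 correct choices.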

Conclusion: These results support the contention that the attribution of
emotional meaning from auditory stimuli is based on characteristic patterns
of acoustic cues. Specifically, they provide evidence for earlier suggestions
(Scherer, 1971; Scherer, Rosenthal, and Koivumaki, 1971) that specific cues 
or cue combinations communicate the major dimensions of emotional meaning. 
Relationships have been found between amplitude level and the potency 
dimension, between variation of pitch and amplitude as well as tempo and 
the activity and potency dimensions, and between pitch level and variation 
and the evaluative dimension. 

The present approach suggests a rapprochement between studies on 
emotional expression in speech and the psychological investigation of emotion 
in music, with interesting implications concerning speculations on the common
origin of music and speech in primitive emotional displays of our prehistoric
ancestors (Langer, 1942). Pertinent studies on the cross-cultural
universality of the vocal expression of emotion, as well as on the development
of the ability to recognize emotions from vocal or musical material in
young children, seem promising and have yet to be done. Judging from
recent evidence (Ekman and Friesen, 1971) supporting Darwin's theory of
innate mechanisms in emotional expression (Darwin, 1887), one may be
justified in speculating about the existence of unlearned neural programs
for the vocal expression and recognition of emotion, especially given 
the strong correspondences between respiratory phenomena and physiological 
correlates of affective state. This line of reasoning might eventually 
lead to a comparative analysis of the vocal expression of emotion in 
humans and auditory signals found in primate communication. 
Footnotes



^1 The author expresses his gratitude to Martin Yaffee and Paul Leinian for
help in the preparation of the synthesized stimuli. The data analysis
was partially supported by a research grant (GS-2654) to Robert Rosenthal
(Harvard University), who contributed helpful comments. The study
was supported by an NSF institutional grant to the author's institution.
^2 After the present study was completed, the author was made aware of an
experiment showing that pleasantness ratings of tone sequences bear a
curvilinear relationship to the amount of stimulus variation, with moderate
variation being perceived as most pleasant (P. C. Vitz, Affect as a
function of stimulus variation, Journal of Experimental Psychology, 1966,
71, 74-79). It is likely that extreme pitch variation in the present
study corresponds to moderate variation in the former.

^3 Chi-square test of goodness of fit to the normal distribution.

^4 The reason for the much more frequent errors on these stimuli can be found
in the fact that the mean difference in the ratings for the two alternatives
in Study I was much smaller than for the rest of the stimuli. A correlation
between the number of errors and the mean difference between alternatives
for each stimulus yielded r = .40, p < .10, N = 16, one-tailed.

^5 Kolmogorov-Smirnov test of goodness of fit.
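The significance level reported in footnote 4 can be checked with the standard conversion of a Pearson correlation to a t statistic with N - 2 degrees of freedom, t = r * sqrt(N - 2) / sqrt(1 - r^2). For r = .40 and N = 16 this gives t of about 1.63 with 14 df, which falls between the one-tailed critical values of 1.345 (p = .10) and 1.761 (p = .05), consistent with the reported p < .10. The critical values in the sketch are copied from standard t tables rather than computed; the function name is illustrative.

```python
from math import sqrt


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# One-tailed critical values of t with 14 df, taken from standard t tables.
T_CRIT_14 = {0.10: 1.345, 0.05: 1.761}

t = r_to_t(0.40, 16)
print(round(t, 2))                            # 1.63
print(T_CRIT_14[0.10] < t < T_CRIT_14[0.05])  # True: .05 < p < .10, one-tailed
```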
Table 1

F-ratios, significance levels, and direction of means^a

                                            Acoustic parameter
Emotion       PV                     AV          PL          AL         TE              Interaction
Pleasantness  5.33** Ex, Down        1.81        < 1         < 1        2.05            11.26** LoAL + HiPL; HiAL + LoPL
Activity      9.94*** Ex, Up, Down   5.98** Ex   4.23        8.73* Hi   35.48*** Fast   9.21** MoPV + LoPL; DoPV + HiPL
Potency       23.46*** Ex, Up, Down  22.03** Ex  1.14        10.44* Hi  5.48* Fast      -
Interest      4.72** Ex, Up, Down    < 1         2.45        4.95       23.63*** Fast   -
Sadness       4.27** Mo              2.82        3.19        3.49       115.20*** Slow  13.97** MoAV + HiPL; ExAV + LoPL
Fear          3.71* Mo, Ex, Up       6.32* Ex    < 1         1.12       11.05** Fast    -
Happiness     8.26*** Ex, Up, Down   7.17* Mo    9.38* Hi    < 1        33.30*** Fast   5.12** ExUpPV + Fast
Disgust       5.62*** Mo             22.50** Mo  6.43* Lo    < 1        6.37* Slow      -
Anger         1.22                   6.70* Ex    < 1         3.84       7.43* Fast      4.83** MoPV + MoAV
Surprise      9.81*** Ex, Up         1.77        12.62** Hi  2.72       45.20*** Fast   7.38*** ExDoPV + HiPL
Elation       2.49                   1.87        2.60        < 1        3.16            -
Boredom       5.59** Mo              < 1         5.50* Lo    < 1        60.19*** Slow   -

^a Higher ratings were found for the level of each parameter shown in the cell.

Abbreviations: PV = pitch variation, AV = amplitude variation, PL = pitch level,
AL = amplitude level, TE = tempo. Mo = moderate, Ex = extreme, Hi = high, Lo = low.

*p < .05, **p < .01, ***p < .001